Phylogenetic tree construction -- NJ method -- rapidnj
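1. Introduction
rapidnj is an efficient tool for building neighbour-joining (NJ) trees. It uses a fast search heuristic that makes it faster than the canonical NJ method.
rapidnj can handle very large distance matrices, which is useful for large-scale genetic data sets or for comparing many sequences. It also supports bootstrap resampling, a method for assessing how reliable each branch of a tree is: the data are resampled many times, a tree is built from each resample, and every branch of the original tree receives a support value. As input, the program accepts a distance matrix in PHYLIP format or a multiple sequence alignment in PHYLIP, Stockholm, or FASTA format, so it integrates easily with other bioinformatics tools.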
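To make the resampling idea concrete, here is a minimal sketch of one bootstrap replicate: alignment columns are drawn with replacement, and the same columns are used for every sequence. This is only an illustration of the general technique, not rapidnj's internal code. The function name resample_columns and the seed argument are hypothetical, and the input is assumed to be an alignment with one sequence line per record and equal sequence lengths.

```shell
#!/bin/sh
# Hypothetical helper: emit one bootstrap replicate of an alignment.
# Assumes FASTA on stdin with exactly one sequence line per record
# and all sequences of equal length.
resample_columns() {
    awk -v seed="${1:-1}" '
        BEGIN { srand(seed) }
        /^>/  { name[++n] = $0; next }   # remember each header
              { seq[n] = $0 }            # and its sequence line
        END {
            len = length(seq[1])
            # draw len column indices with replacement (1..len)
            for (j = 1; j <= len; j++) col[j] = int(rand() * len) + 1
            # apply the same column draw to every sequence
            for (i = 1; i <= n; i++) {
                out = ""
                for (j = 1; j <= len; j++) out = out substr(seq[i], col[j], 1)
                print name[i]
                print out
            }
        }
    '
}

# Usage (file names are placeholders):
#   resample_columns 42 < input.fa > replicate.fa
```

A tree built from each such replicate can then be compared with the original tree; the share of replicates that recover a given branch is that branch's support value.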
2. Installation
Download the rapidnj source archive
wget https://github.com/somme89/rapidNJ/archive/refs/tags/latest.zip
Unpack the archive
unzip latest.zip
Compile and install
cd rapidNJ-latest
make
Make the binary executable (the path is relative to rapidNJ-latest, the directory entered in the previous step):
chmod +x bin/rapidnj
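A quick sanity check after the build can catch a failed compile or a wrong path early. The sketch below assumes the shell is still inside the rapidNJ-latest directory entered above; the function name check_rapidnj and the RAPIDNJ_BIN variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether a rapidnj binary is ready to run.
# The default path assumes the current directory is rapidNJ-latest.
check_rapidnj() {
    bin="${1:-bin/rapidnj}"
    if [ -x "$bin" ]; then
        echo "ok: $bin"
    else
        echo "missing or not executable: $bin"
        return 1
    fi
}

# Check the default location, or the path given in RAPIDNJ_BIN.
check_rapidnj "${RAPIDNJ_BIN:-bin/rapidnj}" || true
```

If the check reports the binary as missing, the make output is the first place to look for errors.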
3. Usage
3.1 Program parameters
/share/nas1/yuj/software/rapidNJ_2.3.3/bin/rapidnj -h
Rapid neighbour-joining. An implementation of the canonical neighbour-joining method which utilize a fast search heuristic to reduce the running time. RapidNJ can be used to reconstruct large trees using a very small amount of memory by utilizing the HDD as storage.
USAGE: rapidnj INPUT [OPTIONS]
The INPUT can be a distance matrix in phylip (.phylip) format or a multiple alignment in stockholm (.sth) or phylip format (.phylip).
OPTIONS:
-h, --help display this help message and exit.
-v, --verbose turn on verbose output.
-i, --input-format ARG Specifies the type of input. pd = distance
matrix in phylip format, sth = multiple alignment in (single line) stockholm format.
fa = multiple alignment in (single line) FASTA format.
-o, --output-format ARG Specifies the type of output. t = phylogenetic tree in newick format
(default), m = distance matrix.
-a, --evolution-model ARG Specifies which sequence evolution method to use when computing
distance estimates from multiple alignments. jc = juke cantor,
kim = Kimura's distance (default).
-m, --memory-size The maximum amount of memory which rapidNJ is allowed to use (in MB).
Default is 90% of all available memory.
-k, --rapidnj-mem ARG Force RapidNJ to use a memory efficient version of rapidNJ. The 'arg'
specifies the percentage of a sorted distance matrix which should be
stored in memory (arg=10 means 10%).
-d, --rapidnj-disk ARG Force RapidNJ to use HDD caching where 'arg' is the directory used to
store cached files.
-c, --cores ARG Number of cores to use for computating distance matrices from multiple
alignments. All available cores are used by default.
-b --bootstrap ARG Compute bootstrap values using ARG samples. The output tree will be
annotated with the bootstrap values.
-t, --alignment-type ARG Force the input alignment to be treated as: p = protein alignment,
d = DNA alignment.
-n --no-negative-length Adjust for negative branch lengths.
-x --output-file ARG Output the result to this file instead of stdout.
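The help text describes the fa input type as "(single line)" FASTA. Assuming this means each sequence must sit on one line, an alignment whose sequences are wrapped over several lines can be unwrapped first. The sketch below uses an awk filter; the function name unwrap_fasta and the file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: join wrapped sequence lines so that every record
# is one header line followed by exactly one sequence line.
unwrap_fasta() {
    awk '
        { sub(/\r$/, "") }              # drop a trailing carriage return
        /^>/ {                          # a header starts a new record
            if (seq != "") print seq    # flush the previous sequence
            print
            seq = ""
            next
        }
        { seq = seq $0 }                # append this chunk of sequence
        END { if (seq != "") print seq }
    '
}

# Usage (file names are placeholders):
#   unwrap_fasta < aligned.fa > input.fa
```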
3.2 Running
/share/nas1/yuj/software/rapidNJ_2.3.3/bin/rapidnj -i fa input.fa -b 1000 > phytree.nwk
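- -i fa: the input file is in FASTA format.
- -b 1000: run 1000 bootstrap resamples. Bootstrap is a method for assessing the reliability of tree branches; the output tree is annotated with the bootstrap values.
The run is finished when the progress output reaches 100%.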
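The same run can be wrapped in a small function that writes the tree with -x instead of redirecting stdout and checks the input first. This is a sketch under assumptions: the RAPIDNJ variable, the function name build_nj_tree and the file names are hypothetical, the alignment is assumed to be DNA, and every flag used is taken from the help text in section 3.1.

```shell
#!/bin/sh
# Path to the rapidnj binary; override RAPIDNJ if it is not on PATH.
RAPIDNJ="${RAPIDNJ:-rapidnj}"

# Hypothetical wrapper: build a bootstrapped NJ tree from a FASTA alignment.
build_nj_tree() {
    aln="$1"              # alignment in single-line FASTA format
    tree="$2"             # Newick file to write
    reps="${3:-1000}"     # number of bootstrap samples

    if [ ! -s "$aln" ]; then
        echo "input alignment not found or empty: $aln" >&2
        return 1
    fi
    # -t d   treat the input as DNA (assumption; use p for protein)
    # -a kim Kimura distance, the documented default, spelled out
    # -x     write the tree to a file instead of stdout
    "$RAPIDNJ" "$aln" -i fa -t d -a kim -b "$reps" -x "$tree"
}

# Usage (file names are placeholders):
#   build_nj_tree input.fa phytree.nwk 1000
```

Writing with -x keeps the Newick file separate from anything else the program prints to the terminal.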